Keen You
keen.you@yale.edu
Deep neural networks are undoubtedly successful at many computer vision tasks. However, as mentioned in class, deep neural networks struggle with tasks that require learning abstract visual concepts, such as same-different relations.
The figures below display a few examples of such tasks. Human judgement on these tasks should be consistent regardless of the positions of the shapes. For a deep CNN, however, the prediction may change if we move the shapes, because the generated feature representation will be different.

Here is an even more extreme example. Human judgement is unaffected by the background, but a deep CNN will be significantly affected because the images differ greatly at the pixel level, even though the abstract visual concepts and the task (find shapes and compare them) are identical.

This observation raises the question: what exactly are deep CNNs learning?
Even though CNNs are inspired by the human visual system, it seems that humans and deep neural networks learn different things: humans learn the underlying abstract visual concepts, whereas deep neural networks learn a set of rich and complex features that are position- and dataset-specific.
In this project, I investigate what deep CNNs learn on tasks that require understanding abstract visual concepts, and what alternative approaches exist.
In particular, I further explore the star-counting tasks from Homework 4, Q6.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import seaborn as sns
sns.set_theme()
Task: given an image, predict the number of stars in the image.
# load data
x1_train = np.load('../hw/Homework4/Q6_1_x_train.npy')
y1_train = np.load('../hw/Homework4/Q6_1_y_train.npy')
x1_test = np.load('../hw/Homework4/Q6_1_x_test.npy')
y1_test = np.load('../hw/Homework4/Q6_1_y_test.npy')
x1_train.shape, y1_train.shape, x1_test.shape, y1_test.shape
((12000, 64, 64, 1), (12000, 1), (5000, 64, 64, 1), (5000, 1))
# dataset visualization
# retrieve 5 images for each class from train
from collections import defaultdict
counts = defaultdict(int)
index = 0
images = defaultdict(list)
while sum(counts.values()) != 20:
    label = int(y1_train[index][0])
    if counts[label] != 5:
        counts[label] += 1
        # save index
        images[label].append(index)
    index += 1
images
defaultdict(list,
{1: [0, 1, 2, 3, 4],
2: [3000, 3001, 3002, 3003, 3004],
4: [6000, 6001, 6002, 6003, 6004],
5: [9000, 9001, 9002, 9003, 9004]})
fig, axs = plt.subplots(4, 5, figsize=(12,12),
                        subplot_kw={'xticks': [], 'yticks': []})
for index, ax in enumerate(axs.flat):
    if index < 10:
        label = index // 5 + 1
    else:
        label = (index + 5) // 5 + 1
    image_index = images[label].pop(0)
    image = x1_train[image_index]
    ax.imshow(image, cmap='gray')
plt.tight_layout()
plt.show()
In Homework 4, a CNN achieves a training MSE of 0.1381 and a testing MSE of 0.1405; accuracies are 0.89 and 0.82 respectively.
Visualizing the filters did not provide much insight into what features the CNN is learning. Thus, I suspect that the CNN is relying on some other signals, such as the number of white pixels. I will verify this hypothesis below.
x1_test.shape
(5000, 64, 64, 1)
# remove last dimension
X_train = x1_train.reshape((12000, 64, 64))
X_test = x1_test.reshape((5000, 64, 64))
# normalize
X_train = X_train / 255.
X_test = X_test / 255.
# simple thresholding to identify object
# Extension: could use better object detection method
# but thresholding suffice for now
X_train = np.where(X_train > 0.5, 1, 0)
X_test = np.where(X_test > 0.5, 1, 0)
# Task 1: examine the number of white pixels for each class
# count number of 1's in each image
white_pixel_counts = np.apply_along_axis(lambda img: np.count_nonzero(img),
                                         1, X_test.reshape(5000, 64 * 64))
count_df = pd.DataFrame({'label': pd.Series(y1_test.T[0]),
                         'white_pixel_count': pd.Series(white_pixel_counts)})
count_df.groupby('label').mean()
| label | white_pixel_count |
|---|---|
| 1.0 | 187.419 |
| 2.0 | 376.945 |
| 3.0 | 560.739 |
| 4.0 | 733.234 |
| 5.0 | 681.283 |
There is a noticeable difference in the number of white pixels across classes.
# plot mean for each class
count_df.groupby('label').mean().plot(kind='bar')
plt.show()
It seems there is a roughly linear relationship from 1 to 4 stars.
count_df.groupby('label').mean().plot(kind='line')
plt.show()
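To check this informally, we can fit a least-squares line to the per-class mean counts for labels 1 through 4 (a rough sketch; the means are the values from the table above, and the variable names are my own):

```python
import numpy as np

# per-class mean white-pixel counts for labels 1-4, taken from the table above
labels = np.array([1, 2, 3, 4])
means = np.array([187.419, 376.945, 560.739, 733.234])

# least-squares line: mean_count ~ slope * label + intercept
slope, intercept = np.polyfit(labels, means, 1)
residuals = means - (slope * labels + intercept)
print(f"slope ~ {slope:.1f} pixels per star, max residual ~ {np.abs(residuals).max():.1f}")
```

The residuals are small relative to the gaps between classes, supporting the visual impression of linearity up to 4 stars; the class-5 mean (681.283) falls well below the extrapolated value at label 5.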
# box plot for each class
count_df.boxplot(column='white_pixel_count', by='label', figsize=(12,9))
plt.show()
# scatter plot colored by label
scatter = plt.scatter(count_df['white_pixel_count'], np.zeros(len(count_df)), c=count_df['label'])
plt.legend(handles=scatter.legend_elements()[0], labels=[1,2,3,4, 5])
plt.show()
Apply the same steps to the training set, which is missing class 3.
# count number of 1's in each image
white_pixel_counts = np.apply_along_axis(lambda img: np.count_nonzero(img),
                                         1, X_train.reshape(12000, 64 * 64))
count_df = pd.DataFrame({'label': pd.Series(y1_train.T[0]),
                         'white_pixel_count': pd.Series(white_pixel_counts)})
count_df.groupby('label').mean()
| label | white_pixel_count |
|---|---|
| 1.0 | 189.326667 |
| 2.0 | 373.577667 |
| 4.0 | 732.096667 |
| 5.0 | 681.938667 |
# plot mean for each class
count_df.groupby('label').mean().plot(kind='bar')
plt.show()
count_df.groupby('label').mean().plot(kind='line')
plt.show()
# box plot for each class
count_df.boxplot(column='white_pixel_count', by='label', figsize=(12,9))
plt.show()
# scatter plot colored by label
scatter = plt.scatter(count_df['white_pixel_count'], np.zeros(len(count_df)), c=count_df['label'])
plt.legend(handles=scatter.legend_elements()[0], labels=[1,2,4,5])
plt.show()
Both training and testing sets exhibit the same trend: the mean white pixel count grows roughly linearly from 1 to 4 stars, while the count for 5 stars falls below that of 4 stars, so the white pixel count alone cannot cleanly separate those two classes.
# Task 2: will linear models suffice?
# plot data points that we want to fit model on
plt.scatter(count_df['white_pixel_count'], count_df['label'])
plt.show()
We can see that the range of white pixel counts for 5 stars is a subset of the range for 4 stars; this will be a problem for any linear classifier based only on the number of white pixels.
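To illustrate the problem, here is a small synthetic sketch (the count ranges below are made up to mimic the nesting seen in the box plot, not taken from the data): when the class-5 count range is nested inside the class-4 range, any classifier that only thresholds this single feature is stuck well below perfect accuracy.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# synthetic counts: the class-5 range (600-780) is nested inside the class-4 range (550-900)
counts_4 = rng.uniform(550, 900, 500)
counts_5 = rng.uniform(600, 780, 500)
X = np.concatenate([counts_4, counts_5]).reshape(-1, 1)
y = np.concatenate([np.full(500, 4), np.full(500, 5)])

# a shallow tree on the single count feature can only carve out intervals,
# and no set of intervals separates a nested range from its container
clf = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(f"train accuracy on 4 vs 5: {clf.score(X, y):.2f}")
```

Even on its own training data, the interval-based rule misclassifies a large share of the class-4 images whose counts fall inside the class-5 range.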
However, I will train a linear regression model on the flattened pixel values as a baseline to compare against the CNN's performance.
from sklearn.linear_model import LinearRegression
from sklearn import preprocessing
from sklearn.metrics import mean_squared_error, accuracy_score
X_train = x1_train.reshape((12000, 64 * 64)) / 255.
y_train = y1_train.T[0]
X_test = x1_test.reshape((5000, 64 * 64)) / 255.
y_test = y1_test.T[0]
# normalize
scaler = preprocessing.StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
# apply the scaler fitted on the training set to the test set as well,
# to avoid leaking test-set statistics
X_test = scaler.transform(X_test)
reg = LinearRegression().fit(X_train, y_train)
train_pred = reg.predict(X_train)
train_mse = mean_squared_error(y_train, train_pred)
train_label_pred = np.array(np.round(train_pred), dtype=int)
train_accuracy = accuracy_score(y_train, train_label_pred)
print(f"Train mse: {train_mse}")
print(f"Train accuracy: {train_accuracy}")
Train mse: 0.15853926129306756 Train accuracy: 0.7915833333333333
test_pred = reg.predict(X_test)
test_mse = mean_squared_error(y_test, test_pred)
test_label_pred = np.array(np.round(test_pred), dtype=int)
test_accuracy = accuracy_score(y_test, test_label_pred)
print(f"Test mse: {test_mse}")
print(f"Test accuracy: {test_accuracy}")
Test mse: 0.40081380210632944 Test accuracy: 0.5724
The performance is significantly worse than the CNN's, which is expected. However, it is far better than random guessing (20%). This suggests that, without any concept of the star shape, the classifier already does decently at this task, let alone one that also incorporates other superficial features, including features not observable to the human eye.
In addition to the number of white pixels, another feature the CNN can very likely exploit is the positions of black pixels. For example, for images with 1 star, roughly three quadrants of the image are black, and no other class exhibits this property.
In general, for this task there are many computational shortcuts that could be used to make predictions, rather than actually learning what a star shape is.
Modified task:
For humans, a star shape is a star shape regardless of its size, i.e., it translates to the same abstract visual concept. However, different sizes induce different feature representations in a CNN. Will this affect predictions? We will find out now.
One crude way of testing this is to randomly zoom into some of the images, so that stars belonging to the same class do not always have the same size (number of white pixels). Part of a star may end up outside the image, but this should not affect human judgement, as one can still count the number of stars (assuming all objects are stars).
import cv2
def zoom_in(img, zoom_factor=1.2):
    y_size = img.shape[0]
    x_size = img.shape[1]
    # define new boundaries
    x1 = int(0.5*x_size*(1-1/zoom_factor))
    x2 = int(x_size-0.5*x_size*(1-1/zoom_factor))
    y1 = int(0.5*y_size*(1-1/zoom_factor))
    y2 = int(y_size-0.5*y_size*(1-1/zoom_factor))
    # crop
    img_cropped = img[y1:y2,x1:x2]
    # scale back up
    return cv2.resize(img_cropped, None, fx=zoom_factor, fy=zoom_factor)
# a few examples
# one star
img = zoom_in(x1_train[100])
plt.imshow(img, cmap='gray')
plt.grid(None)
plt.show()
# two stars
img = zoom_in(x1_train[3436])
plt.imshow(img, cmap='gray')
plt.grid(None)
plt.show()
# four stars
img = zoom_in(x1_train[6160])
plt.imshow(img, cmap='gray')
plt.grid(None)
plt.show()
# randomly zoom in images
# 2000 per class
def batch_zoom(X_data, y, n_per_class=2000):
    X = np.array(X_data, copy=True)
    labels = np.unique(y)
    # group X by class
    for label in labels:
        # indices of the images belonging to this class
        class_indices = np.flatnonzero(y.T[0] == label)
        # randomly select images within the class to zoom in
        zoom_indices = np.random.choice(class_indices, n_per_class, replace=False)
        for index in zoom_indices:
            X[index] = zoom_in(X[index])
    return X
X_train = batch_zoom(x1_train.reshape(len(x1_train), 64, 64), y1_train)
X_test = batch_zoom(x1_test.reshape(len(x1_test), 64, 64), y1_test)
# compare number of white pixels per class
discrete_train = np.where(X_train / 255. > 0.5, 1, 0)
white_pixel_counts = np.apply_along_axis(lambda img: np.count_nonzero(img),
                                         1, discrete_train.reshape(12000, 64 * 64))
count_df = pd.DataFrame({'label': pd.Series(y1_train.T[0]),
                         'white_pixel_count': pd.Series(white_pixel_counts)})
count_df.groupby('label').mean()
| label | white_pixel_count |
|---|---|
| 1.0 | 240.082333 |
| 2.0 | 460.734333 |
| 4.0 | 884.890000 |
| 5.0 | 797.058667 |
Ideally, the average white pixel count for each class should now be similar. Achieving that would require zooming onto the object regions themselves, especially for the one- and two-star images.
# some inspection
# retrieve 5 images for each class from train
from collections import defaultdict
counts = defaultdict(int)
index = 0
images = defaultdict(list)
while sum(counts.values()) != 20:
    label = int(y1_train[index][0])
    if counts[label] != 5:
        counts[label] += 1
        # save index
        images[label].append(index)
    index += 1
images
defaultdict(list,
{1: [0, 1, 2, 3, 4],
2: [3000, 3001, 3002, 3003, 3004],
4: [6000, 6001, 6002, 6003, 6004],
5: [9000, 9001, 9002, 9003, 9004]})
fig, axs = plt.subplots(4, 5, figsize=(12,12),
                        subplot_kw={'xticks': [], 'yticks': []})
for index, ax in enumerate(axs.flat):
    if index < 10:
        label = index // 5 + 1
    else:
        label = (index + 5) // 5 + 1
    image_index = images[label].pop(0)
    image = X_train[image_index]
    ax.imshow(image, cmap='gray')
plt.tight_layout()
plt.show()
X_train = X_train.reshape(12000, 64, 64, 1)
X_test = X_test.reshape(5000, 64, 64, 1)
img_shape_full = (64, 64, 1)
# test on CNN with same configuration as hw4 Q6
from tensorflow.python.keras.models import Sequential
from tensorflow.python.keras.layers import InputLayer, Input
from tensorflow.python.keras.layers import Reshape, MaxPooling2D, GlobalAveragePooling2D
from tensorflow.python.keras.layers import Conv2D, Dense, Flatten
model = Sequential()
model.add(Conv2D(kernel_size=5, strides=1, filters=32, padding='same',
                 activation='relu', name='layer_conv1', input_shape=img_shape_full))
model.add(MaxPooling2D(pool_size=2, strides=2))
model.add(Conv2D(kernel_size=5, strides=1, filters=32, padding='same',
                 activation='relu', name='layer_conv2'))
model.add(MaxPooling2D(pool_size=2, strides=2))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(32, activation='relu'))
# To do regression, use a linear activation for the final dense layer!
model.add(Dense(1, activation='linear'))
model.compile(optimizer='adam',
              loss='mse',
              metrics=['mae'])
history = model.fit(x=X_train,
                    y=y_train,
                    epochs=3, batch_size=100)
Epoch 1/3 120/120 [==============================] - 3s 19ms/step - loss: 234.0646 - mae: 2.2617
Epoch 2/3 120/120 [==============================] - 2s 19ms/step - loss: 0.1470 - mae: 0.2933
Epoch 3/3 120/120 [==============================] - 2s 19ms/step - loss: 0.1046 - mae: 0.2447
# evaluation
result = model.evaluate(X_test, y_test, batch_size=32, verbose=1)
for name, value in zip(model.metrics_names, result):
    print(name, value)
157/157 [==============================] - 1s 6ms/step - loss: 0.3365 - mae: 0.3868
loss 0.33653557300567627
mae 0.38675278425216675
# compare accuracy
# train accuracy
train_pred = model.predict(x=X_train)
train_pred = np.array(np.round(train_pred.T[0]), dtype=int)
train_accuracy = sum(train_pred == y_train.T[0]) / len(train_pred)
# test accuracy
test_pred = model.predict(x=X_test)
test_pred = np.array(np.round(test_pred.T[0]), dtype=int)
test_accuracy = sum(test_pred == y_test.T[0]) / len(test_pred)
print(f'Train accuracy: {train_accuracy}')
print(f'Test accuracy: {test_accuracy}')
Train accuracy: 0.247
Test accuracy: 0.1938
The performance on this dataset is significantly worse compared to the original dataset, where the CNN reached 0.89 accuracy on train and 0.82 on test.
This is expected from a computational perspective, but not from an abstract-concept-learning perspective: someone who grasps the abstract concept of a star shape, or of a shape in general, should be able to count the shapes even when the images are zoomed in.
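A natural follow-up, not run here, would be to train with random zoom as a data augmentation so that object scale stops being a stable cue during training. A sketch of the same CNN architecture with an augmentation layer prepended, assuming a TensorFlow version (2.6+) where `tf.keras.layers.RandomZoom` is available:

```python
import tensorflow as tf

# random zoom is applied only in training mode; at inference it is a no-op
model = tf.keras.Sequential([
    tf.keras.layers.RandomZoom(height_factor=(-0.2, 0.2), fill_mode='constant'),
    tf.keras.layers.Conv2D(32, 5, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(32, 5, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(1, activation='linear'),  # regression head
])
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
```

Whether this recovers the original accuracy, or merely forces the network onto a different shortcut, is exactly the kind of question this project is probing.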
Task: given image, predict the number of tips that the single "pointy star-like object" in the image has.
# load data
x2_train = np.load('../../hw/Homework4/Q6_2_x_train.npy')
y2_train = np.load('../../hw/Homework4/Q6_2_y_train.npy')
x2_test = np.load('../../hw/Homework4/Q6_2_x_test.npy')
y2_test = np.load('../../hw/Homework4/Q6_2_y_test.npy')
y2_train_cat = tf.keras.utils.to_categorical(y2_train)
y2_test_cat = tf.keras.utils.to_categorical(y2_test)
x2_train.shape, y2_train.shape, x2_test.shape, y2_test.shape
((15000, 64, 64, 1), (15000, 1), (5000, 64, 64, 1), (5000, 1))
Data visualization
# retrieve 5 images for each class
from collections import defaultdict
counts = [0] * 5
index = 0
images = defaultdict(list)
while sum(counts) != 25:
    label = int(y2_train[index][0])
    if counts[label] != 5:
        counts[label] += 1
        # save index
        images[label].append(index)
    index += 1
images
defaultdict(list,
{0: [0, 1, 2, 3, 4],
1: [3000, 3001, 3002, 3003, 3004],
2: [6000, 6001, 6002, 6003, 6004],
3: [9000, 9001, 9002, 9003, 9004],
4: [12000, 12001, 12002, 12003, 12004]})
fig, axs = plt.subplots(5, 5, figsize=(12,12),
                        subplot_kw={'xticks': [], 'yticks': []})
for index, ax in enumerate(axs.flat):
    label = index // 5
    image_index = images[label].pop(0)
    image = x2_train[image_index]
    ax.imshow(image, cmap='gray')
plt.tight_layout()
plt.show()
Construct CNN as in Homework 4
from tensorflow.python.keras.models import Sequential
from tensorflow.python.keras.layers import InputLayer, Input
from tensorflow.python.keras.layers import Reshape, MaxPooling2D, GlobalAveragePooling2D
from tensorflow.python.keras.layers import Conv2D, Dense, Flatten
model = Sequential()
model.add(InputLayer(input_shape=(64,64,1),))
model.add(Conv2D(kernel_size=11, strides=4, filters=64, padding='same',
                 activation='relu', name='layer_conv1'))
model.add(MaxPooling2D(pool_size=3, strides=2))
model.add(Conv2D(kernel_size=5, strides=1, filters=128, padding='same',
                 activation='relu', name='layer_conv2'))
model.add(MaxPooling2D(pool_size=3, strides=2))
model.add(Conv2D(kernel_size=3, strides=1, filters=256, padding='same',
                 activation='relu', name='layer_conv3'))
model.add(Conv2D(kernel_size=3, strides=1, filters=256, padding='same',
                 activation='relu', name='layer_conv4'))
model.add(Conv2D(kernel_size=3, strides=1, filters=256, padding='same',
                 activation='relu', name='layer_conv5'))
model.add(MaxPooling2D(pool_size=3, strides=2))
model.add(Flatten())
# First fully-connected / dense layer with ReLU-activation.
model.add(Dense(1024, activation='relu'))
# Second fully-connected / dense layer with ReLU-activation.
model.add(Dense(512, activation='relu'))
# Last fully-connected / dense layer with softmax-activation
# for use in classification.
model.add(Dense(5, activation='softmax'))
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
train_result = model.fit(x2_train,y2_train_cat,epochs=12,batch_size=100)
Epoch 1/12 150/150 [==============================] - 3s 20ms/step - loss: 1.5910 - accuracy: 0.2823
Epoch 2/12 150/150 [==============================] - 3s 20ms/step - loss: 0.8094 - accuracy: 0.6368
Epoch 3/12 150/150 [==============================] - 3s 20ms/step - loss: 0.4365 - accuracy: 0.8161
Epoch 4/12 150/150 [==============================] - 3s 20ms/step - loss: 0.3026 - accuracy: 0.8815
Epoch 5/12 150/150 [==============================] - 3s 20ms/step - loss: 0.2142 - accuracy: 0.9181
Epoch 6/12 150/150 [==============================] - 3s 20ms/step - loss: 0.1743 - accuracy: 0.9361
Epoch 7/12 150/150 [==============================] - 3s 20ms/step - loss: 0.1613 - accuracy: 0.9405
Epoch 8/12 150/150 [==============================] - 3s 20ms/step - loss: 0.1261 - accuracy: 0.9532
Epoch 9/12 150/150 [==============================] - 3s 20ms/step - loss: 0.1132 - accuracy: 0.9605
Epoch 10/12 150/150 [==============================] - 3s 20ms/step - loss: 0.0959 - accuracy: 0.9663
Epoch 11/12 150/150 [==============================] - 3s 20ms/step - loss: 0.0902 - accuracy: 0.9687
Epoch 12/12 150/150 [==============================] - 3s 20ms/step - loss: 0.0926 - accuracy: 0.9657
# evaluate
result = model.evaluate(x=x2_test,
                        y=y2_test_cat)
for name, value in zip(model.metrics_names, result):
    print(name, value)
157/157 [==============================] - 1s 8ms/step - loss: 0.2688 - accuracy: 0.9260
loss 0.26875901222229004
accuracy 0.9260000586509705
The model does a decent job on this task with a 0.96 training accuracy and a 0.92 testing accuracy.
layer_conv1 = model.layers[0]
weights_conv1, bias_conv1 = layer_conv1.get_weights()
weights_conv1.T.shape, bias_conv1.shape
((64, 1, 11, 11), (64,))
Let's take a look what the first layer of filters are learning.
fig, axs = plt.subplots(4, 8, figsize=(9,6),
                        subplot_kw={'xticks': [], 'yticks': []})
for index, ax in enumerate(axs.flat):
    ax.imshow(weights_conv1.T[index][0], cmap='gray', interpolation='gaussian')
plt.tight_layout()
plt.show()
Second layer.
layer_conv2 = model.layers[2]
weights_conv2, bias_conv2 = layer_conv2.get_weights()
weights_conv2.T.shape, bias_conv2.shape
((128, 64, 5, 5), (128,))
fig, axs = plt.subplots(4, 8, figsize=(7,7),
                        subplot_kw={'xticks': [], 'yticks': []})
for index, ax in enumerate(axs.flat):
    w = weights_conv2.T[3][index]
    ax.imshow(w, interpolation='gaussian', cmap='gray')
plt.tight_layout()
plt.show()
The third to last filter in the last row resembles a tip.
layer_conv5 = model.layers[6]
weights_conv5, bias_conv5 = layer_conv5.get_weights()
weights_conv5.T.shape, bias_conv5.shape
((256, 256, 3, 3), (256,))
fig, axs = plt.subplots(4, 8, figsize=(9,9),
                        subplot_kw={'xticks': [], 'yticks': []})
for index, ax in enumerate(axs.flat):
    w = weights_conv5.T[2][index]
    ax.imshow(w, interpolation='gaussian', cmap='gray')
plt.tight_layout()
plt.show()
Even in the last layer of filters, it doesn't seem like they are directly learning a template for, or the concept of, a tip.
As in the previous part, I will take a look at the number of white pixels for each class.
x2_test.shape
(5000, 64, 64, 1)
# remove last dimension
X_train = x2_train.reshape((15000, 64, 64))
X_test = x2_test.reshape((5000, 64, 64))
# normalize
X_train = X_train / 255.
X_test = X_test / 255.
# simple thresholding to identify object
# Extension: could use better object detection method
# but thresholding suffice for now
X_train = np.where(X_train > 0.5, 1, 0)
X_test = np.where(X_test > 0.5, 1, 0)
white_pixel_counts = np.apply_along_axis(lambda img: np.count_nonzero(img),
                                         1, X_test.reshape(5000, 64 * 64))
count_df = pd.DataFrame({'label': pd.Series(y2_test.T[0]),
                         'white_pixel_count': pd.Series(white_pixel_counts)})
count_df.groupby('label').mean()
| label | white_pixel_count |
|---|---|
| 0.0 | 95.423 |
| 1.0 | 139.207 |
| 2.0 | 166.739 |
| 3.0 | 181.359 |
| 4.0 | 191.420 |
# plot mean for each class
count_df.groupby('label').mean().plot(kind='bar')
plt.show()
# box plot for each class
count_df.boxplot(column='white_pixel_count', by='label', figsize=(12,9))
plt.show()
The number of white pixels in each image can be an important signal for its label.
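As a crude illustration of how far that single signal could go, here is a nearest-class-mean rule using the per-class means from the table above (a sketch for intuition, not a claim about the CNN's actual mechanism):

```python
import numpy as np

# per-class mean white-pixel counts for labels 0..4, from the table above
class_means = np.array([95.423, 139.207, 166.739, 181.359, 191.420])

def predict_tips(count):
    """Predict the tip label whose class mean is closest to the pixel count."""
    return int(np.argmin(np.abs(class_means - count)))

print(predict_tips(100), predict_tips(190))
```

Because the means for 2, 3, and 4 extra tips are only ~10-15 pixels apart, this rule alone would confuse neighbouring classes often; the count is a useful signal but not a sufficient one.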
Recall that in Homework 4, the CNN did a terrible job on the final task of counting the number of 5-tip stars in each image.
Task: given image, predict the number of stars that have exactly 5 tips.
# load data
x3_train = np.load('../../hw/Homework4/Q6_3_x_train.npy')
y3_train = np.load('../../hw/Homework4/Q6_3_y_train.npy')
x3_test = np.load('../../hw/Homework4/Q6_3_x_test.npy')
y3_test = np.load('../../hw/Homework4/Q6_3_y_test.npy')
Data Visualization
# retrieve 5 images for each class
from collections import defaultdict
counts = [0] * 5
index = 0
images = defaultdict(list)
while sum(counts) != 25:
    label = int(y3_train[index][0])
    if counts[label] != 5:
        counts[label] += 1
        # save index
        images[label].append(index)
    index += 1
images
defaultdict(list,
{0: [0, 1, 2, 3, 4],
1: [3000, 3001, 3002, 3003, 3004],
2: [6000, 6001, 6002, 6003, 6004],
3: [9000, 9001, 9002, 9003, 9004],
4: [12000, 12001, 12002, 12003, 12004]})
fig, axs = plt.subplots(5, 5, figsize=(12,12),
                        subplot_kw={'xticks': [], 'yticks': []})
for index, ax in enumerate(axs.flat):
    label = index // 5
    image_index = images[label].pop(0)
    image = x3_train[image_index]
    ax.imshow(image, cmap='gray')
plt.tight_layout()
plt.show()
Let's take a look at the number of white pixels per class.
# remove last dimension
X_train = x3_train.reshape((15000, 64, 64))
X_test = x3_test.reshape((5000, 64, 64))
# normalize
X_train = X_train / 255.
X_test = X_test / 255.
# simple thresholding to identify object
# Extension: could use better object detection method
# but thresholding suffice for now
X_train = np.where(X_train > 0.5, 1, 0)
X_test = np.where(X_test > 0.5, 1, 0)
white_pixel_counts = np.apply_along_axis(lambda img: np.count_nonzero(img),
                                         1, X_test.reshape(5000, 64 * 64))
count_df = pd.DataFrame({'label': pd.Series(y3_test.T[0]),
                         'white_pixel_count': pd.Series(white_pixel_counts)})
count_df.groupby('label').mean()
| label | white_pixel_count |
|---|---|
| 0.0 | 643.209 |
| 1.0 | 671.112 |
| 2.0 | 688.709 |
| 3.0 | 709.983 |
| 4.0 | 731.282 |
# plot mean for each class
count_df.groupby('label').mean().plot(kind='bar')
plt.show()
# box plot for each class
count_df.boxplot(column='white_pixel_count', by='label', figsize=(12,9))
plt.show()
The number of white pixels is much more similar across classes than in the previous task. Thus, this may not be a useful feature for the CNN in classification.
Learning the Concept of A Tip
In the previous section, we saw that on the tasks of counting the number of tips and counting the number of stars with 5 tips, CNNs are highly likely to rely on other features rather than learning what a tip is.
In this section, we will explore learning the concept of a tip directly.
First, we will examine some available filters that can potentially be useful.
skimage
from skimage import filters
img = x2_train[0]
plt.imshow(img, cmap='gray')
plt.grid(None)
plt.show()
# edge magnitude
# farid, farid_h, farid_v
new_img = filters.farid(img.reshape(64,64))
plt.imshow(new_img, cmap='gray')
plt.grid(None)
plt.show()
# continuous ridges
# meijering, frangi, sato
new_img = filters.meijering(img.reshape(64,64))
plt.imshow(new_img, cmap='gray')
plt.grid(None)
plt.show()
# edge magnitude
# prewitt, scharr, sobel
new_img = filters.prewitt(img.reshape(64,64))
plt.imshow(new_img, cmap='gray')
plt.grid(None)
plt.show()
# edge magnitude
# roberts, roberts_pos_diag, roberts_neg_diag
new_img = filters.roberts_pos_diag(img.reshape(64,64))
plt.imshow(new_img, cmap='gray')
plt.grid(None)
plt.show()
cv2
import cv2
Corner Counting Approach
# corner detection on original image
dst = cv2.cornerHarris(img,2,3,0.04)
dst = cv2.dilate(dst,None)
# threshold
new_img = np.array(img, copy=True)
new_img[dst>0.5*dst.max()]=[127] # mark corner using gray
plt.imshow(new_img, cmap='gray')
plt.grid(None)
plt.show()
The corner detector seems to be detecting corners successfully. Now let's try it on a denoised image.
# binary thresholding
# img, threshold, max value
ret, new_img = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY)
plt.imshow(new_img, cmap='gray')
plt.grid(None)
plt.show()
# detect corner
new_img = new_img.reshape(64,64,1)
dst = cv2.cornerHarris(new_img,5,3,0.04) # neighborhood size, ksize, k
dst = cv2.dilate(dst,None)
# threshold
copy_img = np.array(new_img, copy=True)
copy_img[dst>0.5*dst.max()]=[127]
plt.imshow(copy_img,cmap='gray')
plt.grid(None)
plt.show()
Plot corner detection for all classes.
def detect_corner(img, blockSize=2, ksize=3, k=0.04, on_original=True, thresh=0.5):
    # corner detection, optionally on a binarized copy of the image
    if not on_original:
        # binary thresholding
        ret, img = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY)
    dst = cv2.cornerHarris(img, blockSize, ksize, k)
    dst = cv2.dilate(dst, None)
    # threshold the response
    new_img = np.array(img, copy=True)
    new_img[dst > thresh*dst.max()] = [150]  # mark corners using gray
    return new_img
Visualize corners
# retrieve 5 images for each class
from collections import defaultdict
counts = [0] * 5
index = 0
images = defaultdict(list)
while sum(counts) != 25:
    label = int(y2_train[index][0])
    if counts[label] != 5:
        counts[label] += 1
        # save index
        images[label].append(index)
    index += 1
images
defaultdict(list,
{0: [0, 1, 2, 3, 4],
1: [3000, 3001, 3002, 3003, 3004],
2: [6000, 6001, 6002, 6003, 6004],
3: [9000, 9001, 9002, 9003, 9004],
4: [12000, 12001, 12002, 12003, 12004]})
# on original image
fig, axs = plt.subplots(5, 5, figsize=(12,12),
                        subplot_kw={'xticks': [], 'yticks': []})
for index, ax in enumerate(axs.flat):
    label = index // 5
    image_index = images[label].pop(0)
    image = x2_train[image_index]
    image = detect_corner(image, blockSize=3, ksize=3, k=0.04, on_original=True, thresh=0.5)
    ax.imshow(image, cmap='gray')
plt.tight_layout()
plt.show()
# on edge only
fig, axs = plt.subplots(5, 5, figsize=(12,12),
                        subplot_kw={'xticks': [], 'yticks': []})
for index, ax in enumerate(axs.flat):
    label = index // 5
    image_index = images[label].pop(0)
    image = x2_train[image_index]
    image = detect_corner(image, blockSize=5, ksize=3, k=0.04, on_original=False, thresh=0.2)
    ax.imshow(image, cmap='gray')
plt.tight_layout()
plt.show()
We see that the detected corners are quite noisy, especially for stars with more tips.
The main challenge is picking a threshold for what counts as a corner. With a small threshold, the marked corner regions grow large and may fuse together for stars with many tips; with a larger threshold, smaller tips may go undetected.
Ridge Counting Approach
1) Counting white ridges
# meijering, frangi, sato
img = x2_train[0]
img = img / 255.
new_img = filters.frangi(img.reshape(64,64), black_ridges=False)
#new_img = filters.frangi(img.reshape(64,64))
plt.imshow(new_img, cmap='gray')
plt.grid(None)
plt.show()
# threshold output image to discretize ridges
copy_img = np.array(img, copy=True)
copy_img[new_img<0.57*new_img.max()]=[0.35]
plt.imshow(copy_img, cmap='gray', interpolation='gaussian')
plt.grid(None)
plt.show()
plt.imshow(np.where(new_img < 0.55 * new_img.max(), 0, 1), cmap='gray')
plt.grid(None)
plt.show()
# retrieve 5 images for each class
from collections import defaultdict
counts = [0] * 5
index = 0
images = defaultdict(list)
while sum(counts) != 25:
    label = int(y2_train[index][0])
    if counts[label] != 5:
        counts[label] += 1
        # save index
        images[label].append(index)
    index += 1
images
defaultdict(list,
{0: [0, 1, 2, 3, 4],
1: [3000, 3001, 3002, 3003, 3004],
2: [6000, 6001, 6002, 6003, 6004],
3: [9000, 9001, 9002, 9003, 9004],
4: [12000, 12001, 12002, 12003, 12004]})
fig, axs = plt.subplots(5, 5, figsize=(12,12),
                        subplot_kw={'xticks': [], 'yticks': []})
for index, ax in enumerate(axs.flat):
    label = index // 5
    image_index = images[label].pop(0)
    image = x2_train[image_index]
    image = filters.frangi(image.reshape(64,64), black_ridges=False)
    ax.imshow(image, cmap='gray')
plt.tight_layout()
plt.show()
From these plots, we see that the tips can be detected. Thus, instead of counting tips directly, we could count the number of ridges.
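The counting step itself could then be connected-component labelling on a binary ridge mask like the thresholded ridge images above. A sketch using `scipy.ndimage` (the helper name and the `min_size` speck filter are my own additions):

```python
import numpy as np
from scipy import ndimage

def count_ridge_components(mask, min_size=3):
    """Count connected components in a binary mask, ignoring tiny specks
    smaller than min_size pixels."""
    labeled, n = ndimage.label(mask)
    sizes = np.array(ndimage.sum(mask, labeled, range(1, n + 1)))
    return int(np.sum(sizes >= min_size))

# toy mask with three separated blobs and one single-pixel speck
mask = np.zeros((20, 20), dtype=int)
mask[2:5, 2:5] = 1
mask[10:13, 10:13] = 1
mask[15:18, 2:5] = 1
mask[0, 19] = 1  # speck, filtered out by min_size
print(count_ridge_components(mask))
```

For this to count tips correctly, the thresholding would have to keep each ridge separate, which is exactly where the threshold choice below becomes difficult.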
def detect_ridges(img, threshold=0.5, black=False):
    img = img / 255.
    if not black:
        new_img = filters.frangi(img.reshape(64,64), black_ridges=False)
    else:
        new_img = filters.frangi(img.reshape(64,64))
    # threshold the ridge response to discretize the ridges
    return np.where(new_img < threshold * new_img.max(), 0, 1)
# retrieve 5 images for each class
from collections import defaultdict
counts = [0] * 5
index = 0
images = defaultdict(list)
while sum(counts) != 25:
    label = int(y2_train[index][0])
    if counts[label] != 5:
        counts[label] += 1
        # save index
        images[label].append(index)
    index += 1
images
defaultdict(list,
{0: [0, 1, 2, 3, 4],
1: [3000, 3001, 3002, 3003, 3004],
2: [6000, 6001, 6002, 6003, 6004],
3: [9000, 9001, 9002, 9003, 9004],
4: [12000, 12001, 12002, 12003, 12004]})
fig, axs = plt.subplots(5, 5, figsize=(12,12),
                        subplot_kw={'xticks': [], 'yticks': []})
for index, ax in enumerate(axs.flat):
    label = index // 5
    image_index = images[label].pop(0)
    image = x2_train[image_index]
    image = detect_ridges(image, 0.65)
    ax.imshow(image, cmap='gray')
plt.tight_layout()
plt.show()
Picking the threshold is again very challenging.
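One option worth trying instead of a hand-picked cutoff is Otsu's method, which chooses a per-image threshold from the response histogram. A sketch on a synthetic bimodal response (the mixture below is made up; whether Otsu separates tips cleanly on the real ridge outputs would still need checking):

```python
import numpy as np
from skimage.filters import threshold_otsu

rng = np.random.default_rng(0)
# synthetic ridge responses: a large low-value background mode
# and a small high-value ridge mode
response = np.concatenate([rng.normal(0.1, 0.03, 900),
                           rng.normal(0.8, 0.05, 100)])
t = threshold_otsu(response)
mask = response > t
print(f"threshold ~ {t:.2f}, foreground fraction = {mask.mean():.2f}")
```

The appeal is that the cutoff adapts to each image's response distribution rather than being a single global fraction of the maximum.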
2) Counting black ridges
# meijering, frangi, sato
img = x2_train[0]
img = img / 255.
#new_img = filters.frangi(img.reshape(64,64), black_ridges=False)
new_img = filters.frangi(img.reshape(64,64))
plt.imshow(new_img, cmap='gray')
plt.grid(None)
plt.show()
# retrieve 5 images for each class
from collections import defaultdict
counts = [0] * 5
index = 0
images = defaultdict(list)
while sum(counts) != 25:
    label = int(y2_train[index][0])
    if counts[label] != 5:
        counts[label] += 1
        # save index
        images[label].append(index)
    index += 1
images
defaultdict(list,
{0: [0, 1, 2, 3, 4],
1: [3000, 3001, 3002, 3003, 3004],
2: [6000, 6001, 6002, 6003, 6004],
3: [9000, 9001, 9002, 9003, 9004],
4: [12000, 12001, 12002, 12003, 12004]})
fig, axs = plt.subplots(5, 5, figsize=(12,12),
                        subplot_kw={'xticks': [], 'yticks': []})
for index, ax in enumerate(axs.flat):
    label = index // 5
    image_index = images[label].pop(0)
    image = x2_train[image_index]
    image = detect_ridges(image, 0.75, True)
    ax.imshow(image, cmap='gray')
plt.tight_layout()
plt.show()
Also notice that the performance is terrible when the star is located near the boundaries of the image.
Finally, let's see how corner and edge detection performs on dataset 3.
img = x3_train[0]
plt.imshow(img, cmap='gray')
plt.grid(None)
plt.show()
# corner on original
new_img = detect_corner(img, blockSize=3,ksize=3,k=0.04,on_original=True,thresh=0.5)
plt.imshow(new_img, cmap='gray')
plt.grid(None)
plt.show()
# corner on edge
new_img = detect_corner(img, blockSize=3,ksize=3,k=0.04,on_original=False,thresh=0.5)
plt.imshow(new_img, cmap='gray')
plt.grid(None)
plt.show()
# white ridge
new_img = detect_ridges(img, threshold=0.5, black=False)
plt.imshow(new_img, cmap='gray')
plt.grid(None)
plt.show()
# black ridge
new_img = detect_ridges(img, threshold=0.5, black=True)
plt.imshow(new_img, cmap='gray')
plt.grid(None)
plt.show()
We can see that detecting black ridges is disadvantageous here because the stars are more compact in this dataset.
Overall, detecting corners directly seems better qualitatively.
Detecting white ridges is likely to merge the ridges together at the center of a star, and detecting black ridges is likely to merge ridges across stars when multiple stars are present in the image.
However, corner detection is not doing wonderfully either, as it is difficult to accommodate tips of different angles from stars with different numbers of tips.